AITopics | nl intent

Collaborating Authors

nl intent

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Grounding Data Science Code Generation with Input-Output Specifications

Wen, Yeming, Yin, Pengcheng, Shi, Kensen, Michalewski, Henryk, Chaudhuri, Swarat, Polozov, Alex

arXiv.org Artificial IntelligenceFeb-12-2024

Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems, requiring additional input-output (I/O) specifications. Unfortunately, LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O specification. In this paper, we give a way to mitigate this issue in the context of data science programming, where tasks require explicit I/O specifications for clarity. Specifically, we propose GIFT4Code, a novel approach for the instruction fine-tuning of LLMs with respect to I/O specifications. Our method leverages synthetic data produced by the LLM itself and utilizes execution-derived feedback as a key learning signal. This feedback, in the form of program I/O specifications, is provided to the LLM to facilitate instruction fine-tuning. We evaluated our approach on two challenging data science benchmarks, Arcade and DS-1000. The results demonstrate a significant improvement in the LLM's ability to generate code that is not only executable but also accurately aligned with user specifications, substantially improving the quality of code generation for complex data science tasks.

grounding data science code generation, llm, specification, (13 more...)

arXiv.org Artificial Intelligence

2402.08073

Country:

Asia > India > Maharashtra > Mumbai (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)
Asia > India > West Bengal > Kolkata (0.04)
(7 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Leisure & Entertainment (1.00)
Transportation > Passenger (0.46)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

Improta, Cristina, Liguori, Pietro, Natella, Roberto, Cukic, Bojan, Cotroneo, Domenico

arXiv.org Artificial IntelligenceOct-1-2023

In this work, we present a method to add perturbations to the code descriptions to create new inputs in natural language (NL) from well-intentioned developers that diverge from the original ones due to the use of new words or because they miss part of them. The goal is to analyze how and to what extent perturbations affect the performance of AI code generators in the context of security-oriented code. First, we show that perturbed descriptions preserve the semantics of the original, non-perturbed ones. Then, we use the method to assess the robustness of three state-of-the-art code generators against the newly perturbed inputs, showing that the performance of these AI-based solutions is highly affected by perturbations in the NL descriptions. To enhance their robustness, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the NL descriptions in the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.

code description, nl intent, perturbation, (13 more...)

arXiv.org Artificial Intelligence

2306.05079

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Campania > Naples (0.04)
(11 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.95)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

DocPrompting: Generating Code by Retrieving the Docs

Zhou, Shuyan, Alon, Uri, Xu, Frank F., Wang, Zhiruo, Jiang, Zhengbao, Neubig, Graham

arXiv.org Artificial IntelligenceFeb-18-2023

Publicly available source-code libraries are continuously growing and changing. This makes it impossible for models of code to keep current with all available APIs by simply training these models on existing code repositories. Thus, existing models inherently cannot generalize to using unseen functions and libraries, because these would never appear in the training data. In contrast, when human programmers use functions and libraries for the first time, they frequently refer to textual resources such as code manuals and documentation, to explore and understand the available functionality. Inspired by this observation, we introduce DocPrompting: a natural-language-to-code generation approach that explicitly leverages documentation by (1) retrieving the relevant documentation pieces given an NL intent, and (2) generating code based on the NL intent and the retrieved documentation. DocPrompting is general: it can be applied to any programming language and is agnostic to the underlying neural model. We demonstrate that DocPrompting consistently improves NL-to-code models: DocPrompting improves strong base models such as CodeT5 by 2.85% in pass@1 (52% relative gain) and 4.39% in pass@10 (30% relative gain) in execution-based evaluation on the popular Python CoNaLa benchmark; on a new Bash dataset tldr, DocPrompting improves CodeT5 and GPT-Neo1.3B by up to absolute 6.9% exact match.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2207.05987

Country:

North America > Dominican Republic (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(5 more...)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.47)

Add feedback

Natural Language to Code Generation in Interactive Data Science Notebooks

Yin, Pengcheng, Li, Wen-Ding, Xiao, Kefan, Rao, Abhishek, Wen, Yeming, Shi, Kensen, Howland, Joshua, Bailey, Paige, Catasta, Michele, Michalewski, Henryk, Polozov, Alex, Sutton, Charles

arXiv.org Artificial IntelligenceDec-19-2022

Computational notebooks, such as Jupyter notebooks, are interactive computing environments that are ubiquitous among data scientists to perform data wrangling and analytic tasks. To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems from the same notebook. It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states as well as previous turns of interaction. To establish a strong baseline on this challenging task, we develop PaChiNCo, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs. Finally, we explore few-shot prompting strategies to elicit better code with step-by-step decomposition and NL explanation, showing the potential to improve the diversity and explainability of model predictions.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2212.09248

Country:

Asia > Middle East > Israel (0.04)
Asia > India > West Bengal > Kolkata (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)
(10 more...)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.61)

Add feedback